# Parallel HTML Parsing with HSV

A proof of concept that demonstrates parallel parsing of HTML-like structured documents using the HSV format.

## Background

HTML parsing has been sequential for 30 years. Research attempts to parallelize it have achieved limited results:

| Project | Year | Approach | Result |
|---------|------|----------|--------|
| HPar | 2013 | Speculative data-parallel | 2.4x on 4 cores |
| ZOOMM | 2013 | Parallel browser engine | 2x (whole engine) |
| Servo | 2017 | Off-main-thread parsing | Tokenization only |

HSV solves this by changing the representation, not the parser.

## How It Works

1. **Represent HTML as HSV** - use control characters instead of angle brackets
2. **Split at delimiters** - a single O(n) scan for the FS control character (0x1C), which HSV uses as its record separator
3. **Parse chunks in parallel** - no state synchronization needed
4. **Reconstruct** - results are independent, just collect them
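
The steps above can be sketched in Go as follows. This is a minimal illustration, not the code in this repository: it assumes a simplified record layout in which FS (0x1C) separates records, RS (0x1E) separates properties, and US (0x1F) separates a key from its value. The `Element` type and the `parseChunk`, `parseSequential`, and `parseParallel` names are hypothetical, and the full HSV grammar has more structure than this.

```go
package main

import (
	"bytes"
	"fmt"
	"reflect"
	"sync"
)

const (
	fs = 0x1C // FS: separates records (step 2 above)
	rs = 0x1E // RS: assumed property separator inside a record
	us = 0x1F // US: assumed key/value separator inside a property
)

// Element is a hypothetical parsed record: a flat map of properties.
type Element map[string]string

// parseChunk parses one record. It reads only its own bytes and
// shares no state with any other chunk.
func parseChunk(chunk []byte) Element {
	el := Element{}
	for _, prop := range bytes.Split(chunk, []byte{rs}) {
		key, value, found := bytes.Cut(prop, []byte{us})
		if !found {
			continue // skip properties without a key/value separator
		}
		el[string(key)] = string(value)
	}
	return el
}

// parseSequential splits at FS and parses the chunks one after another.
func parseSequential(doc []byte) []Element {
	chunks := bytes.Split(doc, []byte{fs})
	out := make([]Element, len(chunks))
	for i, c := range chunks {
		out[i] = parseChunk(c)
	}
	return out
}

// parseParallel splits at FS and parses each chunk in its own goroutine.
// Every goroutine writes to a distinct slice index, so no locking is needed.
func parseParallel(doc []byte) []Element {
	chunks := bytes.Split(doc, []byte{fs})
	out := make([]Element, len(chunks))
	var wg sync.WaitGroup
	for i, c := range chunks {
		wg.Add(1)
		go func(i int, c []byte) {
			defer wg.Done()
			out[i] = parseChunk(c)
		}(i, c)
	}
	wg.Wait()
	return out
}

func main() {
	doc := []byte("tag\x1fh1\x1etext\x1fHello" + "\x1c" + "tag\x1fp\x1etext\x1fa < b & c")
	seq := parseSequential(doc)
	par := parseParallel(doc)
	fmt.Println(len(par), reflect.DeepEqual(seq, par))
}
```

One goroutine per chunk keeps the sketch short. For small chunks, a fixed pool of workers that each take a batch of chunks would probably reduce scheduling overhead.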

## Run Tests

```bash
go test -v
```

## Run Benchmarks

```bash
go test -bench=. -benchmem
```

## Results

```
Size    Chunks  Sequential      Parallel
----    ------  ----------      --------
100     100     68µs            77µs
500     500     360µs           349µs
1000    1000    646µs           637µs
2000    2000    1.45ms          1.40ms
```

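In this benchmark, parallel parsing is about 13% slower at 100 elements and pulls ahead from roughly 500 elements, but only by 1 to 4%. For real HTML processing (DOM building, rendering), where each chunk carries more work, the advantage would likely be larger.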

## Key Points

- **No escaping**: `<div>`, `&`, `"quotes"` preserved literally in HSV
- **Trivial parallelization**: ~50 lines of code
- **Verified correctness**: Sequential and parallel produce identical results
- **Designed for linear scaling**: chunks share no parser state, so there is no speculation, rollback, or synchronization between workers

## Why HSV Succeeds Where Others Struggled

HPar needed speculative parallelization with rollback. Servo moved tokenization off-thread but kept DOM construction sequential. Both fight HTML's stateful parsing model.

HSV changes the question: instead of "how do we parallelize HTML parsing?" it asks "why use a format that requires sequential parsing?"

It's the difference between building a faster horse and building a car.

## References

### HPar (2013)
Zhijia Zhao, Michael Bebenita, Dave Herman, Jianhua Sun, and Xipeng Shen. "HPar: A practical parallel parser for HTML—taming HTML complexities for parallel parsing." *ACM Transactions on Architecture and Code Optimization (TACO)*, Vol. 10, No. 4, Article 44, December 2013.
https://research.csc.ncsu.edu/picture/publications/papers/taco14.pdf

### ZOOMM (2013)
Calin Cascaval, Seth Fowler, Pablo Montesinos-Ortego, Wayne Piekarski, Mehrdad Reshadi, Behnam Robatmili, Michael Weber, and Vrajesh Bhavsar. "ZOOMM: A parallel web browser engine for multicore mobile devices." *Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '13)*, February 2013.
https://dl.acm.org/doi/10.1145/2442516.2442543

### Servo (2017)
"Off main thread HTML parsing in Servo." *Servo Blog*, August 2017.
https://servo.org/blog/2017/08/23/gsoc-parsing/

### ParDOM (2009)
Bhavik Shah, Praveen R. Rao, Bongki Moon, and Mohan Rajagopalan. "A data parallel algorithm for XML DOM parsing." *Database and XML Technologies: 6th International XML Database Symposium (XSym 2009)*, August 2009.
https://www.researchgate.net/publication/221412394_A_data_parallel_algorithm_for_XML_DOM_parsing

## See Also

- [HSV Specification](https://hsvfile.com)
- [HTML in HSV](https://hsvfile.com/html.html)

---

**HSV** was created by [Danslav Slavenskoj](https://github.com/slavenskoj), [Lingenic LLC](https://lingenic.com), 2026.

Dedicated to the public domain under [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/).
